A Novel Approach: Tokenization Framework based on Sentence Structure in Indonesian Language
نویسندگان
چکیده
This study proposes a new approach in the sentence tokenization process. Sentence tokenization, which is known so far, process of breaking sentences based on spaces as separators. Space-based only generates single word tokens. In consisting five words, will produce tokens, one each. Each token. ignores loss original meaning separated words. Our proposed framework can generate one-word tokens and multi-word at same time. The carried out by extracting structure to obtain elements. element There are elements that Subject, Predicate, Object, Complement Adverbs. We extract structures using deep learning methods, where models built training datasets have been prepared before. results quite good with an F1 score 0.7 it still possible improve. similarity topic for measuring performance compared this case multiword token has better accuracy. was created Indonesian language but also use other languages dataset adjustments.
منابع مشابه
task-based language teaching in iran: a mixed study through constructing and validating a new questionnaire based on theoretical, sociocultural, and educational frameworks
جنبه های گوناگونی از زندگی در ایران را از جمله سبک زندگی، علم و امکانات فنی و تکنولوژیکی می توان کم یا بیش وارداتی در نظر گرفت. زبان انگلیسی و روش تدریس آن نیز از این قاعده مثتسنی نیست. با این حال گاهی سوال پیش می آید که آیا یک روش خاص با زیر ساخت های نظری، فرهنگی اجتماعی و آموزشی جامعه ایرانی سازگاری دارد یا خیر. این تحقیق بر اساس روش های ترکیبی انجام شده است.پرسش نامه ای نیز برای زبان آموزان ...
A Reflection on Kristeva's Approach to the Structure of Language
Reaching out to history and subject in terms of meaning variation, Kristeva could show that language cannot simply be a Saussurean sign system. Rather, she went on to delineate that language, beyond signs, is associated with a dynamic system of signification where the ''speaking subject'' is constantly involved in processing. Julia Kristeva, a French critic, psychoanalyst, theoretician, a post-...
متن کاملChinese Sentence Tokenization Using a Word Classifier
In this paper, we explore a Chinese sentence tokenizer built using a word classifier. In contrast to the state of the art conditional random field approaches, this one is simple to implement and easy to train. The work is broken down into two pieces: the sentence maximizer makes guesses over a large number of sentence tokenization candidates and scores each one. The highest scored sentence toke...
متن کاملthe relationship between language and social capital in ilami kurdish: a sociopragmatic approach
چکیده زبان به عنوان یک وسیله در ایجاد و بازسازی سرمایه اجتماعی در چند دهه گذشته مورد توجه بوده است. اگر چه درباره سرمایه اجتماعی و سازه های مربوط به آن زیاد نوشته شده است ولی خیلی کم بر روی اینکه چطور زبان می تواند باعث ایجاد اعتماد یا بی اعتمادی بشود مطالعه ای انجام شده است. این مطالعه به منظور تحقق دو هدف انجام گرفته است. اول تلاش خواهد شد تا یک گونه شناسی از واژگانی که مردم کرد زبان شهر ا...
15 صفحه اولClassifier Ensemble Framework: a Diversity Based Approach
Pattern recognition systems are widely used in a host of different fields. Due to some reasons such as lack of knowledge about a method based on which the best classifier is detected for any arbitrary problem, and thanks to significant improvement in accuracy, researchers turn to ensemble methods in almost every task of pattern recognition. Classification as a major task in pattern recognition,...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: International Journal of Advanced Computer Science and Applications
سال: 2023
ISSN: ['2158-107X', '2156-5570']
DOI: https://doi.org/10.14569/ijacsa.2023.0140264